Abstract

Introduction: Clinical practice guidelines can improve healthcare processes and patient outcomes, but are often of 
low quality. Guideline appraisal tools aim to help potential guideline users in assessing guideline quality. We 
conducted a systematic review of publications describing guideline appraisal tools in order to identify and compare 
existing tools. 

Methods: We searched MEDLINE, EMBASE, the Cochrane Database of Systematic Reviews and other databases from
1995 to May 2011 for relevant primary and secondary publications. We also handsearched the reference lists of
relevant publications. 

On the basis of the available literature we firstly generated 34 items to be used in the comparison of appraisal tools 
and grouped them into thirteen quality dimensions. We then extracted formal characteristics as well as questions and 
statements of the appraisal tools and assigned them to the items. 

Results: We identified 40 different appraisal tools. They covered between three and thirteen of the thirteen possible 
quality dimensions and between three and 29 of the possible 34 items. The appraisal tools focused mainly on
the quality dimensions "evaluation of evidence" (mentioned in 35 tools; 88%), "presentation of guideline content" (34
tools; 85%), "transferability" (33 tools; 83%), "independence" (32 tools; 80%), "scope" (30 tools; 75%), and 
"information retrieval" (29 tools; 73%). The quality dimensions "consideration of different perspectives" and 
"dissemination, implementation and evaluation of the guideline" were covered by only twenty (50%) and eighteen 
tools (45%) respectively. 

Conclusions: Most guideline appraisal tools assess whether the literature search and the evaluation, synthesis and 
presentation of the evidence in guidelines follow the principles of evidence-based medicine. Although conflicts of 
interest and norms and values of guideline developers, as well as patient involvement, affect the trustworthiness of 
guidelines, they are currently insufficiently considered. Greater focus should be placed on these issues in the further 
development of guideline appraisal tools. 
Introduction

Clinical practice guidelines (hereafter referred to as
"guidelines") are defined by the Institute of Medicine as
"statements that include recommendations intended to optimize
patient care that are informed by a systematic review of
evidence and an assessment of the benefits and harms of
alternative care options" [1]. Beyond that, guidelines are used
for a variety of purposes, for example, as a means to measure
and improve the quality of care, to resolve malpractice claims,
to contribute to the development of clinical decision aids or to
support policy makers in the allocation of healthcare resources
[1].

There is evidence to suggest that, when rigorously
developed, guidelines have the power to translate the
complexity of scientific research findings and other evidence
into recommendations for healthcare action [2-5].

Several studies have shown that guidelines can improve
healthcare processes and patient outcomes. Grimshaw,
Eccles, and Tetroe 2004 conducted a systematic review of the
effectiveness and costs of various guideline development,
dissemination and implementation strategies. The majority
(86.6%) of the 235 studies included in their review reported
improvements in health care [6,7]. Two other systematic
reviews reported similar results [8,9]. However, all of the 
authors noted that the studies included were of low 
methodological quality. 

The AGREE Collaboration defines guideline quality as "the 
confidence that the potential biases of guideline development 
have been addressed adequately and that the 
recommendations are both internally and externally valid, and 
are feasible for practice" [10]. This definition has been widely
adopted in the scientific literature [11,12].

Studies investigating the methodological quality of guidelines 
have often reported low quality and no, or only modest, 
improvement in quality over time [13-17]. 

Potential deficits of guidelines include: 

conflicting recommendations [18-26], 

insufficient consideration of relevant patient characteristics 
(e.g., multimorbidity or ethnic differences) [27-30], 

low quality of the evidence underlying the 
recommendations [31-35], 

lack of transparency of methods applied by guideline 
developers, especially concerning the derivation of 
recommendations and the determination of their strength [1], 

inadequate management of potential conflicts of interest 
[36-41]. 

Several groups, such as the Guidelines International 
Network [42], the Institute of Medicine [1], the World Health 
Organization [43], the National Institute for Health and Clinical 
Excellence [44], the Scottish Intercollegiate Guidelines Network 
[45], many medical societies [46-51], as well as individual 
experts in the field [12,52-55], have proposed manuals defining 
standards for guideline developers in order to increase 
guideline quality. Overall, these manuals address the following 
key elements in the development process: establishment of a 
multidisciplinary guideline development group, consumer 
involvement, identification of clinical questions or problems, 
conduct of systematic searches and appraisal of the evidence 
retrieved, procedures for drafting recommendations, external 
consultation, and ongoing reviewing and updating [56]. 

Parallel to the production of manuals for the development of 
high-quality guidelines, tools for their appraisal have been 
developed. These tools aim to help potential guideline users to 
assess guideline quality. The AGREE II Instrument - the 
guideline appraisal tool used most often internationally - 
contains questions covering the areas (1) scope and purpose, 
(2) stakeholder involvement, (3) rigour of development, (4) 
clarity of presentation, (5) applicability, and (6) editorial 
independence [57]. 

Graham 2000 identified and compared guideline appraisal 
tools in a systematic review [58], which was updated by Vlayen 
in 2005 [59]. Vlayen identified 24 different tools containing 
questions that could be grouped into ten quality dimensions 
with 50 different items. Four of the 24 tools covered all of the 
guideline dimensions, but only four were validated and none 
assessed the evidence base of the clinical content of the 
guidelines. The authors stated that "the results of the search for 
evidence, the correct use of inclusion and exclusion criteria, 
and the critical appraisal of the retrieved evidence are not
validated. Therefore, a major conclusion of this review is that in
order to evaluate the quality of the clinical content and more 
specifically the evidence base of a clinical practice guideline, 
verification of the completeness and the quality of the literature 
search and its analysis has to be added to the process of 
validation by an appraisal instrument." 

The aims of this systematic review were to identify and 
compare existing guideline appraisal tools to see if the 
landscape of tools had changed. This comparison can then be 
used to support decision-making by clinicians, patients and 
policy makers concerning the selection of the most appropriate 
tool, as well as to identify potential for improvement. 

Methods 

We searched for relevant primary and secondary
publications (systematic and narrative reviews) in MEDLINE,
EMBASE, the Cochrane Database of Systematic Reviews 
(Cochrane Reviews), the Database of Abstracts of Reviews of 
Effects (Other Reviews), the Health Technology Assessment 
Database (Technology Assessments), the NHS Economic 
Evaluation Database, and the Cochrane Methodology Register. 
The systematic search was limited to publications in German 
and English published after 1994. The search in all databases 
was performed in May 2011. The search strategy included, 
among others, the search terms "guideline", "appraisal", 
"guideline adherence", "quality", "evidence based" and 
"evaluation". The full search strategy, which was developed by 
an information specialist (EH), is attached to this publication as 
online File S1. In addition, we scrutinized the reference lists of
the relevant primary and secondary publications retrieved in 
the above search to identify further publications. 
We included articles with the following characteristics: 

Publication described the most recent version of an 
appraisal tool for clinical guidelines 

Availability of a full-text document (e.g., journal article or 
internet file). 

We excluded articles that only described the content of
guidelines, the guideline development process or the 
application of an appraisal tool already identified in another 
publication. 

Two reviewers (US, WHE) independently screened titles and 
abstracts of the retrieved citations to identify potentially eligible 
primary and secondary publications. The full texts were 
obtained and independently evaluated by the same two 
reviewers. Disagreements were resolved by consensus. 

Since the primary aim of this review was to identify existing 
guideline appraisal tools and to describe and compare their 
formal and content characteristics, no risk of bias assessment 
was conducted for the publications included. 

The content analysis was a two-stage process. The first 
stage involved the generation of items to be used in the 
comparison of appraisal tools by compilation of a list of all 
questions and statements from each of the tools included. 
These were grouped into common questions and statements 
and assigned to an item label. The items were then assigned to 
broader common categories, named quality dimensions, which 
were largely derived from Cluzeau et al. 1999 [60], Graham
2000 [58] and Vlayen 2005 [59]. 

The individual steps of the content analysis procedures were 
always conducted by one person (US) and checked by another 
(WHE). Disagreements were resolved by consensus. 

We identified 34 individual items and assigned them to 
thirteen quality dimensions (see Table 1 for detailed 
definitions). 

For the second stage of the analysis, we (US, WHE) 
extracted the following information from each publication: 

(1) Formal characteristics of the appraisal tool.

These included language, the use of existing appraisal tools
for tool development, number of items and domains, possible
answers, number of appraisers, calculation of domain scores 
and overall assessment, information on the development and 
validation of the appraisal tool, as well as publication in a 
journal. 

(2) Questions and statements of the appraisal tools. 

One reviewer (US) then assigned the questions and 
statements to the items identified during the first stage of the 
content analysis. A second reviewer (WHE) confirmed this step 
by once again checking the questions of each appraisal tool 
and the items to which they had been assigned. 
Disagreements were resolved by consensus. The numbers of 
quality dimensions and items covered by each appraisal tool 
were then compared. 
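
The two-stage procedure described above can be illustrated with a minimal Python sketch. The item-to-dimension mapping reproduces only a small excerpt of Table 1, and the two tools and their item assignments are invented for illustration; none of this is the authors' actual data or software.

```python
# Excerpt of the item -> quality dimension mapping (labels taken from Table 1).
ITEM_TO_DIMENSION = {
    "Literature search": "Information retrieval",
    "Literature selection": "Information retrieval",
    "Grading of evidence": "Evaluation of evidence",
    "Costs": "Transferability",
    "Conflicts of interest": "Independence",
}

# Hypothetical outcome of stage two: the items to which each tool's
# questions and statements were assigned (invented example data).
TOOL_ITEMS = {
    "Tool A (hypothetical)": {"Literature search", "Grading of evidence", "Costs"},
    "Tool B (hypothetical)": {"Literature search", "Literature selection"},
}


def coverage(items):
    """Return (number of items, number of quality dimensions) covered by one tool."""
    dimensions = {ITEM_TO_DIMENSION[item] for item in items}
    return len(items), len(dimensions)


def tools_covering(dimension):
    """Count the tools with at least one item in the given quality dimension."""
    return sum(
        any(ITEM_TO_DIMENSION[item] == dimension for item in items)
        for items in TOOL_ITEMS.values()
    )


for name, items in TOOL_ITEMS.items():
    n_items, n_dims = coverage(items)
    print(f"{name}: {n_items} items, {n_dims} quality dimensions")
```

A set per tool is used here because a tool either covers an item or does not, regardless of how many of its questions map to that item.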

The review was not registered in advance, nor has a review 
protocol been published. 

Results 

Selection of publications 

We retrieved 5164 references from bibliographic databases 
and screened 446 full texts. In addition, we retrieved 62 further
publications from the reference lists of the relevant primary and
secondary publications. We identified a total of 42 eligible
publications describing 40 different guideline appraisal tools
(Figure 1). Excluded publications are listed in online File S2.
Relevant secondary publications are listed in online File S3. 
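
The selection counts can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers reported in the text and in Figure 1; the variable names are chosen here for illustration.

```python
# Counts as reported in the text and in Figure 1.
records_identified = 5164
duplicates = 1016
records_excluded = 3702
from_reference_lists = 62
full_texts_excluded = 466

records_screened = records_identified - duplicates
database_full_texts = records_screened - records_excluded
full_texts_assessed = database_full_texts + from_reference_lists
relevant_publications = full_texts_assessed - full_texts_excluded

# Reasons for full-text exclusion, in the order listed in Figure 1.
exclusion_reasons = [91, 4, 9, 5, 24, 1, 5, 6, 7, 121, 113, 61, 19]

print(records_screened, database_full_texts, full_texts_assessed, relevant_publications)
print(sum(exclusion_reasons) == full_texts_excluded)
```

Each derived value matches the corresponding box in the flow chart (4148 records screened, 446 full texts from databases, 508 full texts assessed, 42 relevant publications), and the individual exclusion reasons add up to the 466 excluded full texts.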

Description of Appraisal Tools 

Table 2 shows the main formal characteristics of the 40 
appraisal tools considered. 38 were published in English and 
two in German. 26 named at least one other publication that 
had influenced their development, and ten named the AGREE
Instrument [10]; other publications mentioned included those by 
Hayward 1995, Wilson 1995 and Field 1992 [61-63]. 

Eleven appraisal tools provided additional information on 
their development process. The number of questions in the 
tools ranged from three to 51. 23 tools grouped their questions 
into domains. The number of domains ranged from two to 21. 
Eighteen tools contained at least some explanation of their 
questions. 

Twenty tools used no specified scoring system, and twelve
used a multiple choice answer, mostly a "yes/no" score, with or
without the options "not sure" or "not applicable". Nine tools
applied some form of scaling system. Six tools explicitly
requested additional comments from guideline appraisers. 

Thirteen appraisal tools recommended that guidelines should 
be appraised independently by at least two reviewers. 

The calculation of a quality score for the domains of an 
appraisal tool and a qualitative or quantitative overall 
assessment of the guideline were suggested by five and six 
tools respectively. Only eleven tools had been subjected to any
form of validation study, and only six of these [13,60,64-67]
had been validated more thoroughly. All but five appraisal tools 
were published in peer-reviewed journals. 

Content analysis 

Figures 2 and 3 compare the quality dimensions and items 
covered by the appraisal tools analysed. 

The tools varied considerably in terms of the number of 
quality dimensions covered. Ten (25%) covered at least twelve 
quality dimensions with at least one item; eleven (28%) 
covered only six or fewer quality dimensions. 

The appraisal tools also differed in the extent to which each 
quality dimension was covered. Of the 34 possible items the 
number covered by each tool varied between three and 29 
(Figure 2). 

The quality dimensions "evaluation of evidence" (mentioned 
in 35 tools; 88%) and "information retrieval" (29 tools; 73%) 
were a main focus of the appraisal tools. However, the tools 
rarely assessed whether the study results were reported 
correctly in the guidelines and supported the recommendations 
(item "consistency" mentioned in six tools; 15%). 

Another focus was the quality dimension "transferability" (33 
tools; 83%) with the items "costs" (25 tools; 63%) and "barriers 
and facilitators" (23 tools; 58%). However, the tools rarely 
assessed whether patients, interventions and settings in the 
studies underlying the recommendations were comparable to 
those targeted by the recommendations (item "comparability" 
mentioned in eight tools; 20%). 

Further quality dimensions covered by at least 70% of the 
appraisal tools were the dimensions "presentation of guideline 
content" (34 tools; 85%), "independence" (32 tools; 80%), 
"scope" (30 tools; 75%), "updating" (30 tools; 75%), and 
"formulation of recommendations" (28 tools; 70%). The item 
"composition of the guideline development group" in the quality 
dimension "independence" was covered frequently (32 tools; 
80%), whereas few appraisal tools mentioned the item 
"consideration of (potential) conflicts of interest" related to the 
guideline development group (eleven tools; 28%). 

The following two quality dimensions were covered by 50% 
or less of the appraisal tools: firstly, "consideration of different 
perspectives" (20 tools; 50%) with the items "patient 
perspectives" (thirteen tools; 33%), "norms and values" (nine 
tools; 23%), and "expert knowledge" (six tools; 15%), and 
secondly, "dissemination, implementation and evaluation of the 
guideline" (eighteen tools; 45%) (Figure 3). 

A table with the complete content characteristics of the 
guideline appraisal tools is attached as online File S4. 
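
The percentages in this section follow directly from the tool counts and the total of 40 tools. The Python sketch below recomputes them under the assumption that halves were rounded up (for example, 33 of 40 is 82.5%, reported as 83%). Python's built-in round() rounds halves to the nearest even number and would give 82 here, so an integer formula is used instead. The helper name percent_half_up is introduced for illustration only.

```python
TOTAL_TOOLS = 40


def percent_half_up(count, total=TOTAL_TOOLS):
    """Percentage of `total`, rounded to the nearest integer with halves rounded up."""
    return (200 * count + total) // (2 * total)


# (number of tools, percentage as reported in the text)
reported = {
    "evaluation of evidence": (35, 88),
    "information retrieval": (29, 73),
    "consistency": (6, 15),
    "transferability": (33, 83),
    "costs": (25, 63),
    "barriers and facilitators": (23, 58),
    "comparability": (8, 20),
    "presentation of guideline content": (34, 85),
    "independence": (32, 80),
    "scope": (30, 75),
    "formulation of recommendations": (28, 70),
    "conflicts of interest": (11, 28),
    "consideration of different perspectives": (20, 50),
    "patient perspectives": (13, 33),
    "norms and values": (9, 23),
    "dissemination, implementation and evaluation": (18, 45),
}

for label, (count, pct) in reported.items():
    status = "ok" if percent_half_up(count) == pct else "MISMATCH"
    print(f"{label}: {count}/{TOTAL_TOOLS} -> {percent_half_up(count)}% ({status})")
```

Under this rounding assumption every reported percentage is reproduced from its count.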
Table 1. Quality dimensions and items for guideline appraisal.

Quality dimension / Item label: Definition

1. Information retrieval
   Health questions and outcomes: Description of clinical health questions and relevant outcomes of the guideline
   Literature search: Search for literature and other evidence
   Literature selection: Criteria used to include and exclude literature and other evidence

2. Evaluation of evidence
   Grading of evidence: Grading of the evidence, which may or may not include a statement about the strength of evidence (LoE)
   Consistency between evidence and recommendations: Study results are reported correctly in the guideline and support the recommendations

3. Consideration of different perspectives
   Norms and values: Discussion of influence of norms and values on guideline development
   Expert knowledge: Evaluation of expert opinion and clinical experience
   Patient perspectives: Consideration of views and preferences of the target population in the guideline development process

4. Formulation of recommendations
   Formulation of recommendations: Methods used in formulating recommendations, which may or may not include a statement about the strength of recommendations (GoR)

5. Transferability
   Comparability: Patients, interventions and settings in the studies were comparable to those targeted by the recommendations
   Costs: Consideration of resource implications of applying the recommendations
   Barriers and facilitators: Description of barriers and facilitators to guideline application (compatibility of guideline with local norms and values; professional's training, skill, and experience; availability of drugs or technology; local adaptation or modification of the guideline)

6. Presentation of guideline content
   Benefits and harms: Presentation of health benefits, side effects, and harms of the recommended action
   Link to evidence: Explicit link between the recommendations and the supporting evidence

7. Alternatives
   Options for management: Presentation of alternative options for management of the condition or health issues
   Exceptions: Description of situations in which guidelines may not apply
   Patient preferences: Consideration of patient preferences in the application of guideline recommendations

8. Reliability
   Independent review: External peer review before publication
   Pilot test: Pilot test of the guideline prior to release

9. Scope
   Rationale and objective: Description of the rationale or reason for guideline development and description of the goal or objective of the guideline
   Guideline topic: Topic, or health problem, or technology dealt with
   Practice setting: Practice setting for which the guideline is intended
   Patient population: Patient population for whom the guideline is intended
   Provider population: Group of health care providers for whom the guideline is intended

10. Independence
   Guideline development group: Individuals and/or disciplines, or occupations represented in the guideline development group and their function in the group
   Guideline development organization and funding: Organization or group who developed the guideline and sources of funding
   Conflicts of interest: Consideration of (potential) conflicts of interest related to the individuals developing the guideline

11. Clarity and presentation
   Clarity: Clear wording of the guideline and the recommendations
   Presentation: Easily identifiable recommendations (e.g., summarized in a box, bold text, underlined). Graphical description of the stages and decisions in clinical care (clinical algorithm).

12. Updating
   Currentness: Currentness of the evidence of the guideline; date of issue of guideline and/or date guideline becomes invalid
   Scheduled review: Procedure for updating the guideline

13. Dissemination, Implementation, Evaluation
   Dissemination: Distribution of the guideline to intended users
   Implementation: Strategies to implement the guideline
   Evaluation: Evaluation of the guideline and the adherence to the guideline once it has been implemented

doi: 10.1371/journal.pone.0082915.t001



Figure 1. Flow chart for selection of appraisal tools.

Records identified through database searching (last search 10.05.2011): n = 5164
Duplicates: n = 1016
Records screened: n = 4148
Records excluded: n = 3702
Full-text articles identified in list of references of relevant publications and assessed for eligibility: n = 62
Full-text articles assessed for eligibility: n = 508
Full-text articles excluded: n = 466
   Not an appraisal tool (including 2 systematic reviews): n = 91
   Tool is not for clinical guidelines: n = 4
   Tool published before 1995: n = 9
   Not German or English: n = 5
   Published in abstract form only: n = 24
   Letter to the editor: n = 1
   Not available from local libraries, interlibrary loan, or author request: n = 5
   Multiple publication of the tool: n = 6
   Old version of the tool: n = 7
   Application of an already identified appraisal tool: n = 121
   Methods for guideline development without criteria for guideline appraisal: n = 113
   Comparison of guidelines without guideline appraisal: n = 61
   Application of guideline recommendations without guideline appraisal: n = 19
Relevant publications: n = 42 (describing n = 40 appraisal tools)

doi: 10.1371/journal.pone.0082915.g001
Figure 2. Percentage (total number) of quality dimensions / items covered by the guideline appraisal tools.

doi: 10.1371/journal.pone.0082915.g002
Figure 3. Percentage (total number) of appraisal tools with questions that can be attributed to the respective quality

Discussion
Main findings 

The aim of this systematic review was to identify and 
compare existing guideline appraisal tools. We identified 40 
different tools. Among those were 24 new tools not included in 
the systematic reviews by Graham 2000 [58] and Vlayen 2005 
[59], as well as an additional three updated tools. 

Most appraisal tools assess whether the literature search,
the evaluation and synthesis of the evidence, and the reporting 
of the evidence in the guidelines are in accordance with the 
principles of evidence-based medicine. However, the guideline 
development process comprises more than the systematic 
compilation of the evidence on a relevant clinical question. 
Burgers et al. 2002 stated that guideline development is a
technical as well as a social process [68]. The choice and
interpretation of the evidence identified and the formulation of
recommendations are affected by the norms and values of the
guideline development group [53,69-74]. Zuiderent-Jerak et al.
2012 suggest that guidelines should reflect all knowledge, not
just clinical trials [75]. However, few appraisal tools assess 
whether the formulation of recommendations is supported by a 
formal consensus process or whether the norms and values of 
the guideline development group are clearly stated. 

Current standards for guideline development [1,42] point out
that patients should be full members of the guideline 
development group. However, many of the appraisal tools fail 
to capture consumer involvement, i.e. do not assess whether 
patients' views were considered in the guideline development 
group. 

Conflicts of interest may influence decisions in the health
care system [76,77], including the development of
guidelines [36-38], and new and more stringent policies have
been called for [42,55,78-80]. It is therefore surprising that only
a few appraisal tools assess whether conflicts of interest of
members of the guideline development group have been 
recorded and addressed. 

Selection of an appraisal tool 

Most of the appraisal tools included can be assigned to one 
of three groups: 

1. Tools with general questions and with no or only a few
appraisal criteria to decide whether the requirements of the
questions are fulfilled [61,62,81-96].

2. Tools with specific questions or appraisal criteria to
decide whether the requirements of the questions are
fulfilled [2,13,14,43,65-67,97-106].

3. A small group of tools with specific questions and / or
appraisal criteria with an additional qualitative appraisal
[57,60,64,107,108].

Differing results of guideline appraisals are more likely in 
cases where the questions of an appraisal tool are imprecise or 
specific criteria for answering the questions are lacking. This 
problem is particularly evident in the tools in the first group. For 
this reason the appraisal tools in the first group cannot be 
recommended for regular use. 



It is also important to underline that appraisal tools in the first 
and second group mainly focus on methodological issues 
surrounding guideline development and reporting. However, 
they do not evaluate the quality of the clinical content itself 
[58,109]. For example, guideline appraisal tools in the first and 
second group assess whether the search strategy was 
reported in the guidelines, but they do not assess whether the 
search strategy was developed correctly or whether it was 
suited to identify evidence to answer the clinical question of the 
guideline. 

While rigorous development and explicit reporting of the 
guideline development process are necessary, they do not 
guarantee appropriate recommendations or better health 
outcomes for patients, as the methodological rigour and quality 
of the clinical content of a clinical practice guideline are not 
necessarily correlated [58,110-112].

Only the five tools of the third group are designed to solve 
this problem, at least to some degree. While their main focus is 
still the appraisal of methodological aspects of guideline 
development and reporting, they nevertheless require 
judgments on whether relevant quality aspects have been 
adequately implemented. For example, they assess not only 
whether the search strategy was reported but also require a 
qualitative statement on whether the strategy was appropriate 
[57,60,64,107,108], whether the evidence identified was 
appropriately summarized in the recommendations 
[60,64,107,108] or whether an appropriate formal process was 
used to arrive at the recommendations [57,60]. 

Appraisal tools differ in the number of items and quality 
dimensions covered. If the aim is to conduct a comprehensive 
guideline appraisal, the AGREE II tool [57] or the German- 
language DELBI tool [65] may represent the best choice. Both 
tools cover all thirteen quality dimensions. The AGREE II tool 
has also been thoroughly evaluated. 

However, an appraisal tool containing many quality 
dimensions may not necessarily represent the best choice in all 
cases. If the primary goal is to learn more about the 
applicability of a guideline, the GLIA tool [67] may be more 
suitable. This thoroughly evaluated tool appraises aspects that 
influence the applicability of a guideline. If the goal is to gain 
more information on the quality of the clinical content of a 
guideline, the ADAPTE tool [64] may be more suitable. This 
tool primarily includes questions that can be assigned to the
quality dimensions "information retrieval" and "evaluation of 
evidence". It has also been thoroughly evaluated, but demands 
considerable skill on the part of the guideline appraiser. 
Moreover, additional information not available in the guideline 
may be needed to answer the questions in this appraisal tool. 

Depending on the problem being addressed, a tool 
containing only a few, but appropriate questions could be 
adequate. Furthermore, it may sometimes be advisable to omit 
some domains or items of an extensive appraisal tool. 

Online File S4 provides details of the items and quality
dimensions covered by the different appraisal tools. 

Strengths and weaknesses of the review 

Our review provides a comprehensive overview of guideline 
appraisal tools. It nevertheless has a number of limitations. 
A systematic search for appraisal tools is difficult, as there is
no appropriate MeSH or other term for appraisal tools.
Because of the large number of appraisal tools used it is 
possible that not all appraisal tools were identified. Due to the 
comprehensive search strategy chosen, which included 
screening the reference lists of relevant primary and secondary 
publications, it is nevertheless unlikely that important and 
commonly used tools were not identified. 

The systematic search for appraisal tools was limited to tools 
published after 1994. In the late 1980s and early 1990s, the 
development of clinical practice guidelines became more 
common. With the definition of clinical practice guidelines by 
Field and Lohr in 1990 [113], a shared understanding of 
guidelines and guideline quality emerged that influenced the 
development of guidelines, as well as the development of 
appraisal tools. Authors of appraisal tools published before 
1995 were probably not able to consider these developments. 

We used the questions and statements contained in the 
appraisal tools, as well as the publications by Cluzeau 1999 
[60], Graham 2000 [58] and Vlayen 2005 [59], to identify items
and quality dimensions. According to this approach, the result 
of this review is a comparative description of the appraisal 
tools. There is no "gold standard" for the evaluation of appraisal 
tools. It is therefore possible that quality dimensions and items 
exist that were not identified, as they were not part of the 
publications and appraisal tools analysed, but may 
nevertheless be relevant for the appraisal of guideline quality. 
Furthermore, it was not always possible to clearly assign the 
questions or items of the appraisal tools to only one quality 
dimension. A further limitation of our review is that no external 
experts were consulted in the validation of the appraisal 
framework. 

Unanswered Questions and Future Research 

The appraisal tools analysed cover several different aspects 
of guideline quality. All tools allow for the grading of guideline 
quality. However, it is uncertain whether all items and quality 
dimensions contribute equally to the quality of a guideline [58]. 
Further empirical studies are needed to answer the question as 
to which items and quality dimensions are essential for the 
assessment of guideline quality; for example, whether the 
external review of guidelines really improves their quality, 
whether conflicts of interest really lead to inappropriate 
recommendations or whether the explicit consideration of 
patient preferences really improves the patient-centeredness of 
a guideline. 

In 2005 Vlayen stated "that in order to evaluate the quality of 
the clinical content and more specifically the evidence base of 
a clinical practice guideline, verification of the completeness 
and the quality of the literature search and its analysis has to 
be added to the process of validation by an appraisal 
instrument" [59]. Some appraisal tools have started to deal with 
this problem but have not solved it so far. 

The appraisal of the quality of the clinical content of 
guidelines is time-consuming, requires highly qualified 
personnel and may need additional information not available in 
the guidelines themselves. For example, an information
specialist may be needed to appraise the appropriateness
of a search strategy, a literature search may have to be
repeated to verify the completeness of the search results, or
the analysis of the literature identified may have to be repeated
to verify its correctness.

Some working groups have started to deal with the appraisal 
of the clinical content of a guideline [114,115], but it remains 
unclear whether the assessment of the evidence base can be 
included in guideline appraisal tools in their current form. 
Further research will have to clarify whether and how overall 
appraisal of the clinical content of a guideline can be included 
in guideline appraisal tools with a reasonable use of resources. 

Conclusions 

Appraisal tools differ in the number of items and quality 
dimensions covered and some tools cover some quality 
dimensions better than others. The most comprehensively 
validated appraisal tool is the AGREE II instrument, but the 
final choice of the appropriate tool depends on the research 
question. Nevertheless, appraisal tools containing unspecific 
questions and / or lacking criteria for answering the questions 
should not be applied. When choosing an appraisal tool it is
important to keep in mind that the main focus of such tools is the appraisal
of methodological aspects of guideline development and not 
the evaluation of the evidence base underlying a clinical 
practice guideline; further research should clarify whether and 
how an overall appraisal of the clinical content of a guideline 
can be performed. 

Although conflicts of interest and norms and values of 
guideline developers, as well as patient involvement, affect the 
trustworthiness of guidelines, they are currently insufficiently 
assessed in guideline appraisal tools. They should thus be 
considered essential items in the further development of such 
tools. 